Linear Model Selection by Cross-Validation
Authors
Abstract
We consider the problem of selecting a model having the best predictive ability among a class of linear models. The popular leave-one-out cross-validation method, which is asymptotically equivalent to many other model selection methods such as the Akaike information criterion (AIC), the Cp, and the bootstrap, is asymptotically inconsistent in the sense that the probability of selecting the model with the best predictive ability does not converge to 1 as the total number of observations n → ∞. We show that the inconsistency of leave-one-out cross-validation can be rectified by using a leave-n_v-out cross-validation with n_v, the number of observations reserved for validation, satisfying n_v/n → 1 as n → ∞. This is a somewhat shocking discovery, because n_v/n → 1 is totally opposite to the popular leave-one-out recipe in cross-validation. Motivations, justifications, and discussions of some practical aspects of the use of the leave-n_v-out cross-validation method are provided, and results from a simulation study are presented.
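The abstract leaves the splitting scheme general; below is a minimal sketch of one common way to implement it, Monte Carlo leave-n_v-out cross-validation with n_v/n → 1. The encoding of candidate models as nested column subsets, the number of random splits, and the choice n_v = n − ⌈n^(3/4)⌉ (so the construction set grows like n^(3/4)) are illustrative assumptions of this sketch, not prescriptions from the paper.

```python
# A minimal sketch of leave-n_v-out (Monte Carlo) cross-validation for
# linear model selection, with n_v / n -> 1. Names (n_v, n_splits) and the
# candidate-model encoding as column subsets are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

def cv_nv_error(X, y, cols, n_v, n_splits=200):
    """Average squared prediction error of the model using columns `cols`,
    estimated over random construction/validation splits of sizes
    (n - n_v, n_v)."""
    n = len(y)
    errs = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        val, con = perm[:n_v], perm[n_v:]
        beta, *_ = np.linalg.lstsq(X[con][:, cols], y[con], rcond=None)
        resid = y[val] - X[val][:, cols] @ beta
        errs.append(np.mean(resid ** 2))
    return np.mean(errs)

# Toy data: the true model uses only the first two columns (intercept + x1).
n, p = 200, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X[:, :2] @ np.array([1.0, 2.0]) + rng.normal(size=n)

candidates = [list(range(k)) for k in range(2, p + 1)]  # nested models
n_v = n - int(n ** 0.75)  # n_v / n -> 1 as n grows, per the paper's condition
best = min(candidates, key=lambda c: cv_nv_error(X, y, c, n_v))
print("selected columns:", best)
```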
Similar articles
On Cross Validation for Model Selection
In response to Zhu and Rohwer (1996), a recent communication (Goutte, 1997) established that leave-one-out cross validation is not subject to the "no-free-lunch" criticism. Despite this optimistic conclusion, we show here that cross validation performs very poorly for the selection of linear models compared to classical statistical tests. We conclude that the statistical tests are prefera...
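For concreteness, here is a hedged sketch of the kind of comparison this abstract alludes to: leave-one-out cross-validation versus a classical partial F-test on a nested pair of linear models. The 5% level, the toy data, and the hat-matrix shortcut for LOO residuals are assumptions of this sketch, not details from the cited paper.

```python
# Contrasting two selection rules for a nested pair of linear models:
# leave-one-out cross-validation versus a classical partial F-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 60
X_small = np.column_stack([np.ones(n), rng.normal(size=n)])
X_big = np.column_stack([X_small, rng.normal(size=n)])  # adds a useless term
y = X_small @ np.array([1.0, 0.5]) + rng.normal(size=n)

def rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

def loo_cv(X, y):
    # Leave-one-out squared prediction error via the hat-matrix shortcut:
    # e_loo_i = e_i / (1 - h_ii) for ordinary least squares.
    H = X @ np.linalg.solve(X.T @ X, X.T)
    e = y - H @ y
    return np.mean((e / (1 - np.diag(H))) ** 2)

# Partial F-test: does the extra column significantly reduce the RSS?
df1, df2 = 1, n - X_big.shape[1]
F = (rss(X_small, y) - rss(X_big, y)) / df1 / (rss(X_big, y) / df2)
p_value = stats.f.sf(F, df1, df2)

print("LOO-CV prefers big model:", loo_cv(X_big, y) < loo_cv(X_small, y))
print("F-test keeps extra term (p < 0.05):", p_value < 0.05)
```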
Model selection for linear classifiers using Bayesian error estimation
Regularized linear models are important classification methods for high-dimensional problems, where they are often preferred for their ability to avoid overfitting. The degrees of freedom of the model are determined by a regularization parameter, which is typically selected using counting-based approaches such as K-fold cross-validation. For large data, this can be v...
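The counting-based baseline named in this abstract can be made concrete; below is a minimal sketch of K-fold cross-validation over a grid of ridge penalties (the Bayesian error estimator the paper proposes is not reproduced here). K = 5, the penalty grid, and the closed-form ridge solve are illustrative choices.

```python
# K-fold cross-validation for choosing a ridge regularization parameter,
# the counting-based approach the abstract describes as the usual baseline.
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 20
X = rng.normal(size=(n, p))
y = X[:, :3].sum(axis=1) + rng.normal(size=n)  # only 3 informative columns

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: (X'X + lam * I)^{-1} X'y.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def kfold_cv_error(X, y, lam, k=5):
    folds = np.array_split(rng.permutation(len(y)), k)
    errs = []
    for val in folds:
        tr = np.setdiff1d(np.arange(len(y)), val)
        beta = ridge_fit(X[tr], y[tr], lam)
        errs.append(np.mean((y[val] - X[val] @ beta) ** 2))
    return np.mean(errs)

lams = np.logspace(-3, 3, 13)
best_lam = min(lams, key=lambda lam: kfold_cv_error(X, y, lam))
print("selected ridge penalty:", best_lam)
```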
Consistency Properties of Model Selection Criteria in Multiple Linear Regression
This paper concerns the asymptotic properties of a class of criteria for model selection in linear regression models, which covers the best-known criteria, e.g., Mallows' Cp, CV (cross-validation), GCV (generalized cross-validation), Akaike's AIC and FPE, as well as Schwarz' BIC. These criteria are shown to be consistent in the sense of selecting the true or larger models, assuming i.i.d....
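For reference, here is a short sketch of three of the criteria named above, in their standard Gaussian-likelihood forms; the additive constants and the use of the full model's variance estimate in Cp follow common convention rather than this paper's notation.

```python
# Standard forms of Mallows' Cp, AIC, and Schwarz' BIC for nested
# linear regression models (lower is better for each criterion).
import numpy as np

rng = np.random.default_rng(3)
n, p = 120, 6
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X[:, :3] @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

def rss(cols):
    beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    return np.sum((y - X[:, cols] @ beta) ** 2)

sigma2_full = rss(list(range(p))) / (n - p)  # variance estimate, full model

for k in range(2, p + 1):
    r = rss(list(range(k)))
    cp = r / sigma2_full - n + 2 * k          # Mallows' Cp
    aic = n * np.log(r / n) + 2 * k           # AIC (up to a constant)
    bic = n * np.log(r / n) + np.log(n) * k   # Schwarz' BIC
    print(f"model with {k} terms: Cp={cp:6.2f}  AIC={aic:7.2f}  BIC={bic:7.2f}")
```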
Asymptotic optimality of full cross-validation for selecting linear regression models
For the problem of model selection, full cross-validation has been proposed as an alternative criterion to traditional cross-validation, particularly in cases where the latter is not well defined. To justify the use of the new proposal, we show that under some conditions, both criteria share the same asymptotic optimality property when selecting among linear regression models.
Structure parameter estimation algorithms for model selection
This paper presents deterministic and stochastic algorithms for estimating structure parameters in the model selection problem. Structure-parameter optimization for linear and non-linear models is investigated. The optimized error function is derived from statistical hypotheses on the distributions of the model parameters. Analytic algorithms are based on estimating the derivatives of the error function...